Supplementary data

This page contains all data used in the Nature publication "Gene expression divergence recapitulates the developmental hourglass".


Contents
1 Raw data
2 Probe to gene mappings (FlyBase 5.14)
3 Normalization, Averaging and Scaling
4 Correlation Analysis
5 Probe Analysis: GC content and overlaps

Raw data

Time points: 01 = 0-2h, 02 = 2-4h, 03 = 4-6h, 04 = 6-8h, 05 = 8-10h, 06 = 10-12h, 07 = 12-14h, 08 = 14-16h, 09 = 16-18h, 10 = 18-20h, 11 = 20-22h, 12 = 22-24h, 13 = 24-26h.

Species  Replicate  Time points sampled (of 13)  Archive
D.mel    1          8                            Dmel.zip
         2          8
         3          8
         4          8
         5          10
         6          5
         7          4
         8          5
D.sim    1          9                            Dsim.zip
         2          9
         3          9
D.ana    1          9                            Dana.zip
         2          9
         3          9
D.pse    1          9                            Dpse.zip
         2          9
         3          9
D.per    1          9                            Dper.zip
         2          9
         3          9
D.vir    1          10                           Dvir.zip
         2          13
         3          13

Probe to gene mappings (FlyBase 5.14)

This file lists the genes that each probe was designed to target and the genes that the probe actually maps to in FlyBase release 5.14.

Normalization, Averaging and Scaling

1  All species, replicates, genes and probes: raw
   File: All species replicate probelog10.txt.zip
   Notes: log10 of the gProcessedSignal values from the Agilent data files.

2  All species, replicates, genes and probes: quantile normalized per time point
   File: All species replicate quantilenormalized probelog10.txt.zip
   Notes: For each time point separately, the replicates (usually 3) of a species were quantile normalized, bringing the signal distributions of those replicates to a common distribution. Quantile normalization was performed on the raw gProcessedSignal data, which were logged afterwards.

3  All species, genes and probes: averaged across replicates
   File: All species avgprobe log10.txt.zip
   Notes: Data from 2 were averaged across replicates ((rep1 + rep2 + rep3)/3). As in 2, the data were logged only after averaging, for output purposes.

4  All species and genes: averaged across probes and row-normalized
   File: All species gene rownorm.txt.zip
   Notes: Data from 3 were logged, averaged per time point across each gene's probes using a Tukey biweight average (which down-weights outliers), and row-normalized, i.e. converted to deviations from the row (gene) mean so that each profile is centered on 0.

5  All species and genes: scaled
   File: All species gene scaled relative to amel.txt.zip
   Notes: Scaling factors relative to D. melanogaster, estimated from the data, were applied to the row-normalized gene data from 4.
   Input for the correlation analysis.

6  All species, replicates, genes and probes: quantile normalized and scaled
   File: All species replicate probelog10 quantileNormalized scaled.txt.zip
   Notes: The quantile-normalized data from 2 (every probe, gene, replicate and species) were scaled with the precomputed scaling factors. For D. melanogaster the data are identical to 2, because its scaling factor is 1.
   Input for the ANOVA.
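The chain from step 2 to step 5 can be sketched in Python as below. This is a minimal standard-library sketch under stated assumptions, not the code used for the publication: ties in quantile normalization are broken by input order, the Tukey biweight is the one-step form with tuning constant c = 5 and epsilon = 1e-4 (the constants of Affymetrix's MAS5 algorithm, assumed here), row normalization only subtracts the mean, and the scaling in step 5 is assumed to be a plain multiplication. The function names and the `raw[t][r][p]` data layout are hypothetical.

```python
import math
import statistics


def quantile_normalize(replicates):
    """Quantile-normalize equal-length replicate vectors from one time point.

    Each value is replaced by the mean, across replicates, of the values that
    share its rank, so all replicates end up with the same distribution.
    Ties are broken by input order (a simplification).
    """
    n = len(replicates[0])
    # For each replicate: probe indices sorted by increasing signal.
    orders = [sorted(range(n), key=rep.__getitem__) for rep in replicates]
    rank_means = [
        statistics.fmean(rep[order[r]] for rep, order in zip(replicates, orders))
        for r in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for r, i in enumerate(order):
            out[i] = rank_means[r]
        normalized.append(out)
    return normalized


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight mean: points far from the median get weight 0."""
    m = statistics.median(values)
    mad = statistics.median(abs(v - m) for v in values)
    weights = []
    for v in values:
        u = (v - m) / (c * mad + epsilon)
        weights.append((1 - u * u) ** 2 if abs(u) <= 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def row_normalize(profile):
    """Convert a profile to deviations from its mean (centered on 0)."""
    m = statistics.fmean(profile)
    return [v - m for v in profile]


def gene_profile(raw, probe_ids):
    """Row-normalized profile of one gene in one species.

    raw[t][r][p] is the raw gProcessedSignal at time point t, replicate r and
    probe p (all probes on the array); probe_ids are the gene's probe indices.
    """
    per_timepoint = []
    for replicates in raw:
        qn = quantile_normalize(replicates)                    # step 2
        avg = [statistics.fmean(vals) for vals in zip(*qn)]    # step 3
        log_avg = [math.log10(v) for v in avg]                 # logged after averaging
        per_timepoint.append(tukey_biweight([log_avg[p] for p in probe_ids]))  # step 4
    return row_normalize(per_timepoint)


def scale(profile, factor):
    """Step 5, assuming the scaling factor is applied multiplicatively."""
    return [factor * v for v in profile]


# Two time points, two replicates, four probes on the array; the gene uses probes 0, 1, 3.
raw = [
    [[120.0, 80.0, 300.0, 95.0], [110.0, 90.0, 280.0, 100.0]],
    [[200.0, 150.0, 310.0, 60.0], [210.0, 140.0, 330.0, 70.0]],
]
profile = gene_profile(raw, probe_ids=[0, 1, 3])
print(profile)  # mean-centered, so the two values sum to approximately 0
```

With a scaling factor of 1 (as for D. melanogaster), `scale` returns the profile unchanged, matching the note for entry 6.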

Correlation Analysis

We calculated pairwise correlation coefficients for all species combinations from the scaled, row-normalized gene profiles (All pairwise correlations.zip).
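A minimal sketch of the pairwise step follows. It assumes Pearson correlation (the coefficient type is not stated above) and assumes that each gene's temporal profile is correlated between two species; the species keys, gene IDs and function names are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    A constant (zero-variance) profile raises ZeroDivisionError.
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(profiles):
    """Correlate each gene's profile between every pair of species.

    profiles[species][gene] is a scaled, row-normalized profile over time points.
    Returns {(species_a, species_b): {gene: r}} for genes present in both species.
    """
    result = {}
    for a, b in itertools.combinations(sorted(profiles), 2):
        shared = profiles[a].keys() & profiles[b].keys()
        result[(a, b)] = {g: pearson(profiles[a][g], profiles[b][g]) for g in sorted(shared)}
    return result


profiles = {
    "dmel": {"g1": [-1.0, 0.0, 1.0]},
    "dsim": {"g1": [-2.0, 0.0, 2.0]},
    "dvir": {"g1": [1.0, 0.0, -1.0]},
}
corr = pairwise_correlations(profiles)
print(corr[("dmel", "dsim")]["g1"])  # 1.0
print(corr[("dmel", "dvir")]["g1"])  # -1.0
```

Iterating over sorted species names with `itertools.combinations` yields each unordered species pair exactly once.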


Probe Analysis: GC content and overlaps

1  Probes: entropy, quality, BLAST, GC%
   File: Allprobes GC content.zip
   Notes: All probes with their sequences, entropy, BLAST and quality scores, and GC percentage.

2  Overlaps between probes within genes
   File: Overlaps basepairs bygene.zip
   Notes: Overlap in base pairs for all pairwise comparisons between probes within each gene (4 per gene, since all species have the same overlap scores for their probes).

3  Relationship between GC content and expression intensity
   File: GC intens scatter hist.zip
   Notes: Plot of GC content against mean expression intensity for all probes.

4  Relationship between GC content contrasts and p-values
   File: GC pval.zip
   Notes: Plot of between-species contrasts in GC content against the associated bootstrapped p-values.

5  GC content of species-specific probes
   File: GC species box.zip
   Notes: GC content of the probes for each species. D. simulans probes have the highest GC content and D. virilis probes the lowest.

6  Relationship between GC variance within and between species
   File: GC within between.pdf.zip
   Notes: Probes with low GC variance within species show a slight tendency toward high GC variance between species, and vice versa.

7  Distribution of probe overlaps for all pairwise comparisons
   File: Probe overlap all hist.zip
   Notes: Histogram of the fraction of probe overlap for all 6 pairwise comparisons among the 4 probes of each gene.

8  Distribution of probe overlaps for neighbouring probes
   File: Probe overlap hist.zip
   Notes: Histogram of the fraction of probe overlap between neighbouring probes.

9  Relationship between probe overlap and between-species variance in GC content
   File: Overlap varGCbetween.zip
   Notes: More strongly overlapping probes tend to show greater variance in GC content between species.
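The GC percentage and the overlap measures behind entries 1, 2, 7 and 8 can be sketched as follows. Assumptions not stated in the table: probe positions are half-open (start, end) coordinates on the same transcript, and the overlap fraction is the overlap in base pairs divided by the length of the shorter probe. The function names and coordinates are hypothetical.

```python
import itertools


def gc_percent(seq):
    """Percentage of G and C bases in a probe sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def overlap_bp(a, b):
    """Overlap in base pairs of two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_fraction(a, b):
    """Overlap as a fraction of the shorter probe (assumed definition)."""
    shorter = min(a[1] - a[0], b[1] - b[0])
    return overlap_bp(a, b) / shorter


def all_pair_fractions(probes):
    """Overlap fractions for every pair of probes in a gene (6 pairs for 4 probes)."""
    return [overlap_fraction(a, b) for a, b in itertools.combinations(probes, 2)]


def neighbour_fractions(probes):
    """Overlap fractions between probes that are adjacent along the transcript."""
    ordered = sorted(probes)
    return [overlap_fraction(a, b) for a, b in zip(ordered, ordered[1:])]


# Four hypothetical 60-bp probes of one gene.
probes = [(0, 60), (30, 90), (80, 140), (200, 260)]
print(gc_percent("GCGCAATT"))        # 50.0
print(all_pair_fractions(probes))    # six values, one per probe pair
print(neighbour_fractions(probes))   # three values, one per adjacent pair
```

Sorting the (start, end) tuples orders probes by start position, so consecutive elements are the neighbouring probes used for entry 8.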